A Survey of Join Processing in Data Streams
Authors
Abstract
1. Introduction
Given the fundamental role played by joins in querying relational databases, it is not surprising that stream join has also been the focus of much research on streams. Recall that the relational (theta) join between two non-streaming relations R1 and R2, denoted R1 ⋈θ R2, returns the set of all pairs (r1, r2), where r1 ∈ R1, r2 ∈ R2, and the join condition θ(r1, r2) evaluates to true. A straightforward extension of join to streams gives the following semantics (in rough terms): at any time t, the set of output tuples generated thus far by the join between two streams S1 and S2 should be the same as the result of the relational (non-streaming) join between the sets of input tuples that have arrived thus far in S1 and S2.
Stream join is a fundamental operation for relating information from different streams. For example, given two streams of packets seen by network monitors placed at two routers, we can join the streams on packet ids to identify those packets that flowed through both routers, and compute the time it took for each such packet to reach the other router. As another example, an online auction system may generate two event streams: one signals the opening of auctions and the other contains bids on the open auctions. A stream join is needed to relate bids with the corresponding open-auction events. As a third example, which involves a non-equality join, consider two data streams that arise in monitoring a cluster machine room, where one stream contains load information collected from different machines, and the other stream contains temperature readings from various sensors in the room. Using a stream join, we can look for possible correlations between loads on machines and temperatures at different locations
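The stream join semantics described above can be illustrated with a minimal Python sketch. It is not an algorithm taken from the survey: the class `SymmetricStreamJoin`, the helper `relational_join`, and the `(packet_id, timestamp)` tuple layout are all hypothetical names and assumptions chosen for illustration. Each arriving tuple is compared against every tuple already seen on the other stream, so the output produced so far always equals the relational join of the inputs received so far. The sample data follows the router example, joining on packet id.

```python
from typing import Any, Callable, List, Tuple

Pair = Tuple[Any, Any]


class SymmetricStreamJoin:
    """Incremental theta-join over two streams (illustrative, hypothetical helper)."""

    def __init__(self, theta: Callable[[Any, Any], bool]) -> None:
        self.theta = theta            # join condition theta(r1, r2)
        self.seen_s1: List[Any] = []  # tuples that have arrived so far on S1
        self.seen_s2: List[Any] = []  # tuples that have arrived so far on S2

    def insert(self, side: int, tup: Any) -> List[Pair]:
        """Consume one arriving tuple and return the new output pairs it creates."""
        if side == 1:
            new_pairs = [(tup, r2) for r2 in self.seen_s2 if self.theta(tup, r2)]
            self.seen_s1.append(tup)
        elif side == 2:
            new_pairs = [(r1, tup) for r1 in self.seen_s1 if self.theta(r1, tup)]
            self.seen_s2.append(tup)
        else:
            raise ValueError("side must be 1 or 2")
        return new_pairs


def relational_join(rel1: List[Any], rel2: List[Any],
                    theta: Callable[[Any, Any], bool]) -> List[Pair]:
    """Plain non-streaming theta-join, used here only to check the semantics."""
    return [(r1, r2) for r1 in rel1 for r2 in rel2 if theta(r1, r2)]


def same_packet(r1: Tuple[str, int], r2: Tuple[str, int]) -> bool:
    """Equality join condition on packet id (assumed tuple layout: (id, timestamp))."""
    return r1[0] == r2[0]


# Interleaved arrivals: (stream number, (packet_id, timestamp)). Made-up sample data.
arrivals = [
    (1, ("p1", 10)),
    (2, ("p1", 14)),
    (1, ("p2", 11)),
    (1, ("p3", 12)),
    (2, ("p3", 19)),
]

join = SymmetricStreamJoin(same_packet)
output: List[Pair] = []
for side, tup in arrivals:
    output.extend(join.insert(side, tup))
    # Semantics check: output so far == relational join of the inputs so far.
    expected = relational_join(join.seen_s1, join.seen_s2, same_packet)
    assert sorted(output) == sorted(expected)

# Time each matched packet took to travel from the first router to the second.
transit_times = {r1[0]: r2[1] - r1[1] for r1, r2 in output}
print(transit_times)
```

Because the condition is an arbitrary predicate, the same sketch also covers non-equality joins such as the load and temperature example. Note that it retains every tuple it has ever seen, so its state grows without bound as the streams continue.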
Similar resources
Twig'n Join: Progressive Query Processing of Multiple XML Streams
We propose a practical approach to the progressive processing of (FWR) XQuery queries on multiple XML streams, called Twig’n Join (or TnJ). The query is decomposed into a query plan combining several twig queries on the individual streams, followed by a multi-way join and a final twig query. The processing is itself accordingly decomposed into three pipelined stages progressively producing stre...
GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams
We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the...
Adaptive optimization of join trees for multi-join queries over sensor streams
Data processing applications for sensor streams have to deal with multiple continuous data streams with inputs arriving at highly variable and unpredictable rates from various sources. These applications perform various operations (e.g., filter, aggregate, join) on incoming data streams in real time according to predefined queries or rules. Since the data rate and data distribution fluctuate...
Causality Join Query Processing for Data Streams via a Spatiotemporal Sliding Window
Data streams collected from sensors contain a large volume of useful information including causal relationships. Causality join query processing involves retrieving a set of pairs (cause, effect) from streams of data. However, some causal pairs may be omitted from the query result, due to the delay between sensors and the data stream management system, and the limited size of the sliding window...
A Stream Database Server for Sensor Applications
We present a framework for stream data processing that incorporates a stream database server as a fundamental component. The server operates as the stream control interface between arrays of distributed data stream sources and end-user clients that access and analyze the streams. The underlying framework provides novel stream management and query processing mechanisms to support the online acqu...